Wurm lab

MSc Bioinformatics

QA and Assembly

bmpvieira.com/assembly14

bmpvieira

Bruno Vieira | @bmpvieira

PhD Student @ QMUL

Bioinformatics and Population Genomics


Supervisor:
Yannick Wurm | @yannick__





© 2014 Bruno Vieira CC-BY 4.0


Download data

bit.ly/ant-reads

Useful books

Papers

De novo genome assembly: what every biologist should know

Assemblathon 2: evaluating de novo methods of genome assembly[...]


Genome Assembly




Chen 2011


Types

Algorithms

Strategies




Assembly paradigms


Overlap/Layout/Consensus



Chen 2011
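
The overlap step of Overlap/Layout/Consensus can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how any production assembler works: the reads are made up, they are error-free, and the helper names `longest_overlap` and `merge` are hypothetical.

```python
def longest_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    # Try the longest candidate first, stop at the first exact match.
    for n in range(max_len, max(min_len, 1) - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def merge(a, b, min_len=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    n = longest_overlap(a, b, min_len)
    return a + b[n:] if n else None


# Made-up toy reads that tile a short sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCGGA"]

contig = reads[0]
for read in reads[1:]:
    contig = merge(contig, read)

print(contig)  # ATGGCGTACCGGA
```

Real reads contain errors and repeats, so practical overlap detection is approximate and the layout step has to resolve conflicting overlaps.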


de Bruijn


Chen 2011
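
A de Bruijn graph can be sketched in a few lines of Python. In this minimal version every k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. The reads and the function name `de_bruijn` are made up for illustration, and error correction and reverse complements are ignored.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge: prefix of the k-mer -> suffix of the k-mer.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Two made-up overlapping reads.
graph = de_bruijn(["ATGGCG", "GGCGTA"], k=4)

for node, successors in graph.items():
    print(node, "->", successors)
```

The k-mer shared by both reads (GGCG) shows up as a repeated edge from GGC to GCG, which is how coverage appears in this representation.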




Schatz 2012




Schatz 2012


Too many assemblers

seqanswers.com/wiki/De-novo_assembly



A5, ABySS, ALLPATHS, CABOG, CLCbio, Contrail, Curtain, DecGPU, Forge, Geneious, GenoMiner, IDBA, Lasergene, MIRA, Newbler, PE-Assembler, QSRA, Ray, SeqMan NGen, SeqPrep, Sequencher, SHARCGS, SHORTY, SHRAP, SOAPdenovo, SR-ASM, SuccinctAssembly, SUTTA, Taipan, VCAKE, Velvet


Benchmarking



Why we need the assemblathon


Assembly quality assessment




N50 must die?
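
For reference, N50 is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal Python sketch, with a hypothetical function name and made-up contig lengths:

```python
def n50(lengths):
    """Smallest length in the longest-first prefix covering half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if running * 2 >= total:
            return length
    return 0  # empty assembly


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(n50([100, 1, 1, 1]))                # 100
```

The second call shows one weakness of the metric: a single long contig dominates it, and nothing in the number says whether that contig is correct.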





FastQC

FastQC Documentation




Diginorm

"(...)systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors."


"(...)reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting content of the generated contigs."

Magic? No, Bloom filters
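
The core idea of digital normalization can be sketched in Python: keep a read only if the median count of its k-mers, among the reads kept so far, is still below a cutoff. This sketch is an assumption-laden toy. It uses an exact `Counter` in place of the memory-efficient probabilistic counting the slide refers to, the values of `k` and `cutoff` are toy values, and the names `kmers` and `normalize` are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=4, cutoff=2):
    """Keep reads whose median k-mer count so far is below `cutoff`."""
    counts = Counter()  # exact counts; stand-in for a probabilistic structure
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k
        if median(counts[x] for x in ks) < cutoff:
            kept.append(read)
            counts.update(ks)  # only kept reads contribute to the counts
    return kept


# Made-up data: one sequence seen five times, another seen once.
reads = ["ATGGCGTA"] * 5 + ["CCCTTTAA"]
kept = normalize(reads)
print(len(reads), "->", len(kept))  # 6 -> 3
```

Redundant copies of the high-coverage read are discarded once its k-mers reach the cutoff, while the low-coverage read is kept.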


What is digital normalization, anyway?

Why you shouldn't use digital normalization


Fasta
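
A FASTA record is a header line starting with `>` followed by one or more sequence lines. A minimal Python reader, as a sketch only: the record names and sequences are made up, and `parse_fasta` is a hypothetical helper, not a library function.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


example = """>contig1 made-up example
ATGGCGTA
CCGGTTAA
>contig2
TTTT
"""

records = list(parse_fasta(example.splitlines()))
for name, seq in records:
    print(name, len(seq))
```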


Fastq
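
A FASTQ record adds per-base qualities: an `@` header, the sequence, a `+` separator line, and a quality string of the same length as the sequence. The Python sketch below assumes unwrapped four-line records and Phred+33 encoding. Both are assumptions about the input, not guarantees, and `parse_fastq` is a hypothetical helper.

```python
def parse_fastq(lines):
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("expected four lines per record")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed record at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33 (assumed): quality score = ASCII code - 33.
        yield header[1:], seq, [ord(c) - 33 for c in qual]


example = "@read1\nACGT\n+\nII5!\n"  # made-up read
fastq_records = list(parse_fastq(example.splitlines()))
print(fastq_records)  # [('read1', 'ACGT', [40, 40, 20, 0])]
```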


Interleaved format
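
In an interleaved file the two reads of each pair sit next to each other: read 1 of pair 1, read 2 of pair 1, read 1 of pair 2, and so on. A minimal Python sketch, assuming the forward and reverse records are already in matching order. The record names are made up and both helpers are hypothetical.

```python
def interleave(forward, reverse):
    """Alternate records from two paired lists: f1, r1, f2, r2, ..."""
    if len(forward) != len(reverse):
        raise ValueError("paired files must hold the same number of records")
    out = []
    for r1, r2 in zip(forward, reverse):
        out.extend([r1, r2])
    return out


def deinterleave(records):
    """Split an interleaved list back into forward and reverse lists."""
    return records[0::2], records[1::2]


# Made-up record names standing in for full FASTQ records.
forward = ["frag1/1", "frag2/1"]
reverse = ["frag1/2", "frag2/2"]

mixed = interleave(forward, reverse)
print(mixed)  # ['frag1/1', 'frag1/2', 'frag2/1', 'frag2/2']
```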


Practical

bmpvieira.com/assembly14-practical



Copyright 2016 Authors. All rights reserved.